Section I

Summary of Program for Reporting Period

Program Objectives

To develop practical, low cost, real-time methods for suppressing noise which has been acoustically added to speech.

To demonstrate that, through the incorporation of the noise suppression methods, speech can be effectively analysed for narrow band digital transmission in practical operating environments.

Summary of Tasks and Results

Introduction

This semi-annual technical report describes the current status of the research areas for the period 1 April 1979 through 30 September 1979.
SUPPRESSION OF ACOUSTIC NOISE IN SPEECH
USING TWO MICROPHONE ADAPTIVE NOISE CANCELLATION

Steven F. Boll
Dennis Pulsipher
adaptively filtered using the least mean squares (LMS) and the lattice gradient algorithms. These two approaches are developed and compared in terms of degree of noise power reduction, algorithm convergence time, and degree of speech enhancement. Both methods were shown to reduce ambient noise power by at least 20 dB with minimal speech distortion, and thus to be potentially powerful as noise suppression preprocessors for voice communication in severe noise environments.
I. INTRODUCTION

It has been shown that the ambient background noise generated in many operating environments significantly reduces measured speech intelligibility and quality [1], [2]. A number of single microphone approaches for reducing the background noise added to speech have been developed [3], [4]. These methods become ineffective when the noise power is equal to or greater than the signal power, or when the noise characteristics (e.g., mean, variance) change rapidly in time. This paper studies the performance of an approach to noise suppression in which a second, correlated noise source is recorded and used to reduce the noise added to the speech. This second noise source is adaptively filtered and subtracted from the primary microphone signal so as to minimize the output power of the difference. This approach generates an output which is the least squares estimate of the speech waveform. Two adaptive algorithms used to filter the correlated noise are investigated: the LMS approach [5], [6], and the lattice gradient approach [7], [8], [9]. Both methods approximate the least squares (Wiener) solution. The LMS algorithm uses the method of steepest descent and approximates the ensemble gradient with the instantaneous gradient. The lattice gradient algorithm uses Newton's method in an orthogonal basis generated by the lattice filter.

A severely degraded noisy speech signal was recorded for testing the performance of each method. The ambient noise energy was amplified to mask the recorded speech signal. Both approaches are compared in terms of degree of noise power reduction, algorithm settling time, degree of speech enhancement, and computational complexity.

The paper is divided into sections which develop the two adaptive least squares estimators, describe the experiments conducted, and demonstrate the algorithm performance.

II. TWO MICROPHONE SIGNAL GENERATION MODEL

The noise suppression experiments were based on the model shown in Figure 1. The primary signal x(j) consists of the speech signal s(j), plus the common noise signal n(j) filtered through a transmission channel G1(z), plus another independent noise signal m1(j). The reference signal v(j) consists of the common noise signal n(j) filtered through a transmission channel G2(z), added to a second independent noise signal m2(j). The signals s(j), n(j), m1(j), and m2(j) are assumed independent of each other, and G1(z) and G2(z) are assumed constant in time.
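A minimal Python sketch of the Figure 1 model may help fix the notation. It makes several hypothetical assumptions: G1(z) and G2(z) are short FIR filters with made-up coefficients, n(j), m1(j), and m2(j) are white Gaussian, and a sinusoid stands in for s(j). `two_mic_signals` is an illustrative helper, not code from this study.

```python
import numpy as np

def two_mic_signals(s, n, g1, g2, m1, m2):
    """Figure 1 model: x(j) = s(j) + (G1 n)(j) + m1(j), v(j) = (G2 n)(j) + m2(j).

    g1 and g2 are FIR impulse responses standing in for G1(z) and G2(z)
    (an assumption; real room channels are neither finite nor known).
    """
    L = len(s)
    x = s + np.convolve(n, g1)[:L] + m1   # primary microphone
    v = np.convolve(n, g2)[:L] + m2       # reference microphone
    return x, v

rng = np.random.default_rng(0)
L = 8000
s = np.sin(2 * np.pi * 300 / 6670 * np.arange(L))   # placeholder for speech
n = rng.standard_normal(L)                         # common noise source
g1 = np.array([0.0, 0.0, 0.8, 0.3, -0.1])          # hypothetical path to primary
g2 = np.array([1.0])                               # reference mic at the source
m1 = 0.01 * rng.standard_normal(L)                 # independent noise, primary
m2 = 0.01 * rng.standard_normal(L)                 # independent noise, reference
x, v = two_mic_signals(s, n, g1, g2, m1, m2)
```

Setting g2 = [1.0] mimics a reference microphone placed right at the noise source. That choice keeps the ideal canceller G1(z)/G2(z) short and causal.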

III. THE WIENER SOLUTION FOR ACOUSTIC NOISE SUPPRESSION USING TWO INPUTS

A general filtering model used to suppress the noise component in x(j) which is correlated with v(j) is shown in Figure 2. As is discussed in [5], minimizing the output power of e(j) minimizes the output noise power and results in e(j) being the minimum mean square estimate of s(j).

The tap weights of the all-zero filter W(z) are computed to minimize the total expected output power. Using the orthogonal projection theorem [10], E[e^2(j)] will be minimized when

$$E[e(j+k)\,v(j)] = 0 \quad \text{for all } k,$$

where

$$e(j) = x(j) - \sum_{i=-\infty}^{\infty} w(i)\,v(j-i).$$


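This orthogonality relation results in the Wiener-Hopf equation:

$$\sum_{i=-\infty}^{\infty} w(i)\,R_{vv}(k-i) = R_{xv}(k) \quad \text{for all } k,$$

where

$$R_{vv}(k) = E[v(j+k)\,v(j)], \qquad R_{xv}(k) = E[x(j+k)\,v(j)].$$

The z transform W(z) of the Wiener filter is given by:

$$W(z) = \frac{P_{xv}(z)}{P_{vv}(z)},$$

where

$$P_{xv}(z) = \sum_{k=-\infty}^{\infty} R_{xv}(k)\,z^{-k}, \qquad P_{vv}(z) = \sum_{k=-\infty}^{\infty} R_{vv}(k)\,z^{-k}.$$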
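For the signal model shown in Figure 1, the Wiener filter reduces to

$$W(z) = \frac{P_{nn}(z)\,G_1(z)\,G_2^{*}(z)}{P_{m_2m_2}(z) + P_{nn}(z)\,|G_2(z)|^2},$$

where P_nn(z) and P_m2m2(z) are the power spectra of n(j) and m2(j), respectively, and G2*(z) denotes the complex conjugate of G2(z) on the unit circle (G2(z^{-1}) for a real filter).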


The output power spectrum P_ee(z) using the Wiener filter is given by:

$$P_{ee}(z) = P_{ss}(z) + P_{m_1m_1}(z) + P_{nn}(z)\,|G_1(z)|^2 \left[1 - \frac{P_{nn}(z)\,|G_2(z)|^2}{P_{m_2m_2}(z) + P_{nn}(z)\,|G_2(z)|^2}\right].$$


From the Wiener filter equation, note that if P_m2m2(z) is small compared to P_nn(z)|G2(z)|^2, then

$$W(z) \approx G_1(z)\,G_2^{-1}(z).$$

This is exactly the linear system required to transform v(j) into the correlated noise component which was added to s(j). Also, under this condition:

$$P_{ee}(z) \approx P_{ss}(z) + P_{m_1m_1}(z).$$

Thus, if the independent noise sources m1(j) and m2(j) are negligible with respect to the signal s(j) and the common noise signal n(j), the output signal e(j) will match s(j) in the mean squared sense.
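The closed-form output spectrum can be checked numerically on the unit circle z = e^{jω}. The sketch below makes illustrative assumptions: hypothetical FIR channels G1 and G2, and flat (white) power spectra at made-up levels. It forms the Figure 1 Wiener filter and computes P_ee directly from e = s + m1 + (G1 - W G2) n - W m2. It then compares that result with the closed form. For real-coefficient filters, G2*(z) on the unit circle is simply `np.conj(G2)`.

```python
import numpy as np

w = np.linspace(0, np.pi, 257)                    # frequency grid, radians/sample
z = np.exp(1j * w)
G1 = 0.8 * z**-2 + 0.3 * z**-3 - 0.1 * z**-4      # hypothetical primary channel
G2 = 1.0 + 0.5 * z**-1                            # hypothetical reference channel
Pss, Pnn, Pm1, Pm2 = 0.2, 1.0, 1e-3, 1e-3         # assumed flat power spectra

# Wiener filter for the Figure 1 model
W = Pnn * G1 * np.conj(G2) / (Pm2 + Pnn * np.abs(G2) ** 2)

# Output spectrum computed directly from e = s + m1 + (G1 - W G2) n - W m2
Pee_direct = Pss + Pm1 + np.abs(G1 - W * G2) ** 2 * Pnn + np.abs(W) ** 2 * Pm2

# Closed-form expression from the text
D = Pm2 + Pnn * np.abs(G2) ** 2
Pee_formula = Pss + Pm1 + Pnn * np.abs(G1) ** 2 * (1 - Pnn * np.abs(G2) ** 2 / D)
```

Because P_m2m2 is small here, W should be close to G1/G2, and P_ee should be close to P_ss + P_m1m1, as the text predicts.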

IV. MATRIX FORM OF WIENER-HOPF EQUATION AND THE GRADIENT VECTOR

Define the reference signal vector V(j) as

$$V(j) = [\,v(j-1)\;\; v(j-2)\;\; \cdots\;\; v(j-N)\,]^T$$

and the filter weight vector as

$$W = [\,w_1\;\; w_2\;\; \cdots\;\; w_N\,]^T.$$
The noise cancelled output e(j) is given in vector form as:

$$e(j) = x(j-1) - W^T V(j) = x(j-1) - V^T(j)\,W.$$

The mean square error is given by:

$$\xi = E[e^2(j)] = E[x^2(j-1)] - 2P^T W + W^T R W,$$

where

$$P = E\{x(j-1)\,V(j)\} = E\begin{bmatrix} x(j-1)\,v(j-1) \\ x(j-1)\,v(j-2) \\ \vdots \\ x(j-1)\,v(j-N) \end{bmatrix},$$

$$R = E\{V(j)\,V^T(j)\} = E\begin{bmatrix} v^2(j-1) & v(j-1)\,v(j-2) & \cdots & v(j-1)\,v(j-N) \\ v(j-2)\,v(j-1) & v^2(j-2) & \cdots & v(j-2)\,v(j-N) \\ \vdots & \vdots & \ddots & \vdots \\ v(j-N)\,v(j-1) & v(j-N)\,v(j-2) & \cdots & v^2(j-N) \end{bmatrix}.$$


The optimal weight vector W* orthogonalizes the error e(j) with respect to the reference signal vector V(j). Thus the optimal estimate of W satisfies

$$E[V(j)\,e(j)] = 0,$$

so that

$$E[V(j)\,V^T(j)]\,W^* = E[x(j-1)\,V(j)],$$

or

$$W^* = R^{-1} P.$$
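The following is a minimal Python sketch of W* = R^{-1}P. Sample averages replace the ensemble averages R and P (an assumption; the text works with expectations). The indexing follows e(j) = x(j-1) - W^T V(j). In the noise-free example the primary channel is a hypothetical 3-tap FIR filter, so the solution should recover its taps and give ξ_min ≈ 0. `wiener_weights` is an illustrative name.

```python
import numpy as np

def wiener_weights(x, v, N):
    """Sample-average version of W* = R^{-1} P with V(j) = [v(j-1), ..., v(j-N)]^T."""
    J = len(x)
    # Row for time j holds V(j); j runs from N to J-1 so every entry exists.
    V = np.array([v[j - N:j][::-1] for j in range(N, J)])
    d = x[N - 1:J - 1]                  # x(j-1) for the same j
    R = V.T @ V / len(d)                # estimate of E{V(j) V^T(j)}
    P = V.T @ d / len(d)                # estimate of E{x(j-1) V(j)}
    W = np.linalg.solve(R, P)           # W* = R^{-1} P without forming R^{-1}
    xi_min = np.mean(d ** 2) - P @ W    # E[x^2(j-1)] - P^T W*
    return W, xi_min

rng = np.random.default_rng(0)
v = rng.standard_normal(20000)                    # white reference (assumed)
x = np.convolve(v, [0.5, -0.3, 0.2])[:len(v)]     # hypothetical noise path, no speech
W, xi_min = wiener_weights(x, v, 5)
```

Because x(j-1) = 0.5 v(j-1) - 0.3 v(j-2) + 0.2 v(j-3) exactly, W should come out as [0.5, -0.3, 0.2, 0, 0].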

The optimal weight vector can also be derived by first calculating the gradient of the mean square error surface and then finding the value of W which forces it to zero. Thus define:

$$\nabla = \frac{\partial \xi}{\partial W} = -2P + 2RW;$$

then ∇ = 0 when W = R^{-1}P.

The minimum mean square error is given by:

$$\xi_{\min} = E[x^2(j-1)] - P^T W^*.$$

The mean squared error ξ can also be expressed as:

$$\xi = \xi_{\min} + \Delta W^T R\,\Delta W,$$

where ΔW = W - W*. Taking the gradient ∇ of ξ with respect to W gives an alternative expression for ∇:

$$\nabla = 2R\,\Delta W.$$
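These error-surface identities can be verified numerically for any positive-definite R. In this sketch R, P, and E[x^2(j-1)] are random or made-up values (assumptions). They serve only to exercise the algebra ξ = ξ_min + ΔW^T R ΔW and ∇ = 2RΔW.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 4
A = rng.standard_normal((N, N))
R = A @ A.T + N * np.eye(N)       # hypothetical positive-definite R
P = rng.standard_normal(N)        # hypothetical cross-correlation vector
Ex2 = 10.0                        # hypothetical E[x^2(j-1)]

def mse(W):
    """xi = E[x^2(j-1)] - 2 P^T W + W^T R W."""
    return Ex2 - 2 * P @ W + W @ R @ W

def grad(W):
    """nabla = -2P + 2RW."""
    return -2 * P + 2 * R @ W

W_opt = np.linalg.solve(R, P)     # W* = R^{-1} P
xi_min = Ex2 - P @ W_opt          # xi_min = E[x^2(j-1)] - P^T W*
W = rng.standard_normal(N)        # an arbitrary trial weight vector
dW = W - W_opt
```

The quadratic form makes the error surface a bowl centred on W*. Its shape is set entirely by R, which is why the eigenvalues of R govern the convergence of the iterative methods that follow.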

V. ITERATIVE METHODS FOR ESTIMATING THE OPTIMAL WEIGHT VECTOR

A. Method of Steepest Descent [5], [6]

Let W(j) denote an estimate of the optimal weight vector at time index j. The gradient ∇(j) evaluated at W(j) points in the direction of the greatest rate of increase in ξ. In the method of steepest descent, the new estimate W(j+1) equals the old estimate W(j) plus a term proportional to the negative of the gradient. Thus

$$W(j+1) = W(j) + \mu\,(-\nabla(j)),$$

where μ is a positive constant called the step size. Subtracting W* from both sides and using ∇(j) = 2RΔW(j) gives

$$\Delta W(j+1) = (I - 2\mu R)\,\Delta W(j)$$

or

$$\Delta W(j) = (I - 2\mu R)^j\,\Delta W(0).$$

By diagonalizing R using the eigenvector decomposition [6], it can be shown that convergence requires

$$\frac{1}{\lambda_{\max}} > \mu > 0,$$

and that the convergence time constant for the pth weight in the principal (eigenvector) coordinates is given by

$$\tau_p = \frac{1}{2\mu\lambda_p},$$

where λ_p is the pth eigenvalue of R.
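The convergence bound and time constants can be seen in a small deterministic example. This Python sketch runs steepest descent with the exact gradient -2P + 2RW on a hypothetical diagonal R with eigenvalues 4, 1, and 0.25. Under W(j+1) = W(j) - μ∇(j), each principal mode contracts by a factor 1 - 2μλ_p per iteration. That factor yields both the bound on μ and the time constants 1/(2μλ_p). The matrix and step sizes are illustrative assumptions.

```python
import numpy as np

def steepest_descent(R, P, mu, iters):
    """Iterate W(j+1) = W(j) + mu * (-grad) with the exact gradient -2P + 2RW."""
    W = np.zeros(len(P))
    for _ in range(iters):
        W = W - mu * (-2 * P + 2 * R @ W)
    return W

# Hypothetical R with eigenvalues 4, 1 and 0.25; P chosen so that W* = [1, 1, 1].
R = np.diag([4.0, 1.0, 0.25])
P = R @ np.ones(3)
lam = np.linalg.eigvalsh(R)      # ascending: 0.25, 1, 4
mu = 0.5 / lam.max()             # inside the bound 1/lambda_max > mu > 0
tau = 1 / (2 * mu * lam)         # per-mode time constants, in iterations
W = steepest_descent(R, P, mu, 400)
```

With μ = 0.5/λ_max the largest-eigenvalue mode settles in a single step, while the smallest has a time constant of 16 iterations. The eigenvalue spread of R, not the step size alone, therefore sets the overall settling time. Choosing μ = 1.1/λ_max instead makes that fastest mode grow by a factor of 1.2 per iteration, so the iteration diverges.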


B. Newton's Method

In Newton's method the gradient is premultiplied by the inverse of R:

$$W(j+1) = W(j) - \tfrac{1}{2}\,R^{-1}\,\nabla(j).$$

Substituting for the gradient gives:

$$W(j+1) = W(j) - \tfrac{1}{2}\,R^{-1}\left(-2P + 2RW(j)\right) = R^{-1}P = W^*.$$

Thus Newton's method converges in one iteration. This approach is also referred to as the fast start-up equalizer [8], [11].
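Here is a quick numerical check of the one-step property. A randomly generated positive-definite R and vector P stand in for the true statistics (assumptions for illustration). The sketch solves a linear system instead of forming R^{-1} explicitly, which is the usual numerically preferable choice. `newton_step` is an illustrative name.

```python
import numpy as np

def newton_step(W, R, P):
    """One Newton iteration W(j+1) = W(j) - (1/2) R^{-1} grad, grad = -2P + 2RW."""
    grad = -2 * P + 2 * R @ W
    return W - 0.5 * np.linalg.solve(R, grad)

rng = np.random.default_rng(2)
A = rng.standard_normal((6, 6))
R = A @ A.T + 6 * np.eye(6)        # hypothetical positive-definite R
P = rng.standard_normal(6)         # hypothetical cross-correlation vector
W0 = rng.standard_normal(6)        # arbitrary starting point
W1 = newton_step(W0, R, P)         # should already equal W* = R^{-1} P
```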


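VI. ITERATIVE SOLUTIONS BASED ON APPROXIMATIONS TO THE ENSEMBLE AVERAGES

A. The LMS Algorithm [5], [6]

In the LMS algorithm, the method of steepest descent is used with the ensemble gradient approximated by the instantaneous gradient:

$$\hat{\nabla}(j) = \frac{\partial e^2(j)}{\partial W} = -2\,e(j)\,V(j).$$

Substituting this estimate into the steepest descent update gives the LMS weight update:

$$W(j+1) = W(j) + 2\mu\,e(j)\,V(j).$$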


It can be shown [5], [6] that the expected value of the LMS weight vector converges to the Wiener solution. Using the instantaneous gradient introduces an error, called gradient noise, which results in an excess mean squared error over that obtainable with the Wiener solution. A figure of merit for the estimation process is the misadjustment: the average excess mean squared error divided by the minimum mean squared error. It can be shown [6] that the misadjustment is equal to

$$M = \frac{1}{2}\sum_{i=1}^{N} \frac{1}{\tau_i} = \mu\,\mathrm{tr}\,R,$$

where tr denotes the trace.

As is discussed in the section on results, misadjustment is an important design factor in noise suppression, since a large misadjustment manifests itself as a pronounced echo in the speech waveform. The echo is removed by reducing the step size, and thus the misadjustment. This reduction, of course, conflicts with the requirement of a quick settling time for the algorithm, which can be shortened by using a large step size. The trade-off between misadjustment and settling time is discussed in the results section.
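The following is a minimal LMS noise canceller in the notation of Section IV, written as a Python sketch. The step size comes from a target misadjustment via M = μ tr R. Here tr R is estimated as N times the reference power, which holds for a stationary reference because every diagonal entry of R is E[v^2]. The example data are hypothetical: a white reference and a 3-tap FIR noise path with no speech. In that case ξ_min = 0 and there is no gradient noise. With speech present, e(j) also contains s(j), and a larger μ lets more of the speech drive the weight updates, consistent with the echo described above. `lms_cancel` is an illustrative name.

```python
import numpy as np

def lms_cancel(x, v, N, misadjustment):
    """LMS noise canceller: e(j) = x(j-1) - W^T V(j), W(j+1) = W(j) + 2 mu e(j) V(j).

    mu is chosen from M = mu tr R, estimating tr R as N * mean(v^2).
    """
    mu = misadjustment / (N * np.mean(v ** 2))
    W = np.zeros(N)
    e = np.zeros(len(x))
    for j in range(N, len(x)):
        V = v[j - N:j][::-1]            # V(j) = [v(j-1), ..., v(j-N)]
        e[j] = x[j - 1] - W @ V         # noise-cancelled output
        W += 2 * mu * e[j] * V          # steepest descent with instantaneous gradient
    return e, W

rng = np.random.default_rng(0)
v = rng.standard_normal(5000)                    # white reference (assumed)
x = np.convolve(v, [0.5, -0.3, 0.2])[:len(v)]    # hypothetical noise path, no speech
e, W = lms_cancel(x, v, 8, 0.05)                 # 8 taps, 5% misadjustment
```

For this reference, μ ≈ 0.05/8 and the weight time constant 1/(2μλ) is about 80 samples. A 1% setting would make it roughly five times longer, which mirrors the settling-time trade-off discussed above.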


B. Approximate Newton's Method

Newton's method can be approximated, just as the steepest descent method was, by replacing the ensemble gradient with the instantaneous gradient. The Newton's method approximation would then be:

$$W(j+1) = W(j) + \alpha\,R^{-1}\,e(j)\,V(j),$$

where α is the normalized step size.

Ignoring for the moment how the autocorrelation matrix R would be calculated and inverted, this approach offers certain advantages over the method of steepest descent.

The convergence properties of this approach can be estimated in the same way as for the method of steepest descent. Specifically, replacing the noisy gradient with the true gradient and subtracting the optimal weight vector from both sides of the above expression gives:

$$\Delta W(j+1) = (1-\alpha)\,\Delta W(j).$$

Thus for convergence it is necessary that:

$$2 > \alpha > 0.$$

The adaptation time constant would be the same for each tap weight and approximately equal to:

$$\tau \approx \frac{1}{\alpha}.$$

Using a diagonalization analysis similar to that used for the LMS algorithm, the misadjustment due to gradient noise can be shown to be approximately equal to:

$$M \approx \frac{N\alpha}{2}.$$

This approach to tap weight estimation has the advantage over LMS that all tap weights have essentially the same adaptation time constant. Its disadvantage is that the gradient estimate must be multiplied by the inverse of R at each iteration.
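Below is a sketch of the approximate Newton update. It assumes R is known exactly in advance, since the text sets aside how R would be obtained. The reference is a hypothetical colored AR(1) process with coefficient 0.9, whose autocorrelation matrix has a large eigenvalue spread. The step size is set from the small-step relation M ≈ Nα/2 for this update, which is itself an assumption of the sketch. `ar1`, `ar1_autocorr_matrix`, and `newton_lms` are illustrative names.

```python
import numpy as np

def ar1(a, length, rng):
    """Colored reference v(j) = a v(j-1) + u(j) with unit-variance white u."""
    u = rng.standard_normal(length)
    v = np.zeros(length)
    for j in range(1, length):
        v[j] = a * v[j - 1] + u[j]
    return v

def ar1_autocorr_matrix(a, N):
    """Exact N x N autocorrelation matrix of the stationary AR(1) process above."""
    idx = np.arange(N)
    r = a ** np.arange(N) / (1 - a ** 2)
    return r[np.abs(np.subtract.outer(idx, idx))]

def newton_lms(x, v, R, alpha):
    """Approximate Newton update W(j+1) = W(j) + alpha R^{-1} e(j) V(j)."""
    N = R.shape[0]
    Rinv = np.linalg.inv(R)              # assumed known; a practical system must estimate it
    W = np.zeros(N)
    for j in range(N, len(x)):
        V = v[j - N:j][::-1]             # V(j) = [v(j-1), ..., v(j-N)]
        e = x[j - 1] - W @ V
        W += alpha * e * (Rinv @ V)
    return W

rng = np.random.default_rng(1)
v = ar1(0.9, 6000, rng)
x = np.convolve(v, [0.5, -0.3, 0.2])[:len(v)]    # hypothetical noise path, no speech
N, M = 4, 0.05
alpha = 2 * M / N                                # from M ~ N alpha / 2 (assumed)
W = newton_lms(x, v, ar1_autocorr_matrix(0.9, N), alpha)
```

Premultiplying by R^{-1} gives every weight the same time constant of roughly 1/α = 40 samples, despite the spread in the eigenvalues of R. Plain LMS with the same data would be paced by its smallest eigenvalue.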


The next section discusses how the tap weights can be estimated using an orthogonal basis which has a diagonal autocorrelation matrix. With this diagonalization, the number of operations per update is linear in the number of tap weights.

C. Orthogonalization Using the Lattice Structure [7], [8], [9]

Generating an orthogonal basis for estimating the tap weights requires a transformation that maps the reference signal {v(j-m)} into an orthogonal signal {g_m(j)}, where:

$$E\{g_m(j)\,g_k(j)\} = 0 \quad \text{for } m \neq k.$$

The lattice structure provides this transformation.

The mathematics describing this orthogonalization for stationary reference signals is well developed in the theory of linear prediction of speech [12], [13]. The basis {g_m(j)}, called the backward prediction error, can be generated recursively using the lattice filter structure:

$$g_m(j+1) = g_{m-1}(j) + k_m\,f_{m-1}(j),$$

$$f_m(j) = f_{m-1}(j) + k_m\,g_{m-1}(j), \qquad m = 1, 2, \ldots, N,$$

where:

$$f_0(j) = v(j), \qquad g_0(j) = v(j-1).$$

The sequence {f_m(j)} is called the forward prediction error, and the parameters {k_m} are known as k-parameters, reflection coefficients, or PARCOR parameters. It can be shown that if k_m is estimated to minimize the forward prediction error energy ε(m) at stage m, where

$$\varepsilon(m) = E[f_m^2(j)],$$

then the backward prediction sequence will be orthogonal:

$$E\{g_m(j)\,g_k(j)\} = \beta_m\,\delta_{mk},$$

where

$$\beta_m = E[g_m^2(j)].$$
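The orthogonalizing property of the lattice can be checked on a strongly correlated reference. The sketch below uses block sample averages in place of the expectations and hypothetical helper names. It chooses each k_m in turn to minimize the sample forward energy. It then runs the lattice recursion as written above, starting from g_0(j) = v(j-1), and measures the normalized cross-correlations of g_0 through g_3. For an AR(1) reference with coefficient 0.9, adjacent raw samples v(j-m) have correlation 0.9. The backward errors should come out nearly uncorrelated, with k_1 ≈ -0.9 and the higher k_m ≈ 0.

```python
import numpy as np

def lattice_stage(f_prev, g_prev, k):
    """f_m(j) = f_{m-1}(j) + k g_{m-1}(j);  g_m(j+1) = g_{m-1}(j) + k f_{m-1}(j)."""
    f = f_prev + k * g_prev
    g = np.zeros_like(g_prev)
    g[1:] = g_prev[:-1] + k * f_prev[:-1]   # one-sample delay; g_m(0) = 0
    return f, g

def orthogonalize(v, N):
    """Pick each k_m in turn to minimise the sample forward energy mean(f_m^2),
    returning k_1..k_N and the backward sequences g_0..g_{N-1} (one row each)."""
    f = np.asarray(v, dtype=float)
    g = np.zeros_like(f)
    g[1:] = f[:-1]                          # g_0(j) = v(j-1)
    ks, gs = [], [g]
    for _ in range(N):
        k = -np.mean(f * g) / np.mean(g * g)   # zero of d/dk mean((f + k g)^2)
        f, g = lattice_stage(f, g, k)
        ks.append(k)
        gs.append(g)
    return np.array(ks), np.array(gs[:N])

rng = np.random.default_rng(0)
u = rng.standard_normal(100_000)
v = np.zeros_like(u)
for j in range(1, len(u)):                  # colored AR(1) reference, a = 0.9 (assumed)
    v[j] = 0.9 * v[j - 1] + u[j]
k, G = orthogonalize(v, 4)
C = np.corrcoef(G[:, 10:])                  # normalised cross-correlations of g_0..g_3
```

The off-diagonal entries of C should be close to zero, while the raw lagged samples of v would show correlations of 0.9, 0.81, and so on.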
D. Adaptive Lattice Algorithm [7], [8], [9]

To generate the orthogonal basis needed to estimate the Wiener noise cancelling filter with Newton's method, the k-parameters must first be estimated to minimize the forward prediction error energy. These k-parameters can then be used to generate the backward prediction sequence using the lattice structure. Since estimating the k-parameters is just another least squares problem, Newton's method is used here too.

The derivative of the forward prediction error energy (i.e., the instantaneous gradient in the orthogonal basis) is given by:

$$\frac{\partial f_m^2(j)}{\partial k_m} = 2\,f_m(j)\,g_{m-1}(j).$$


The adaptive lattice algorithm is then defined as:

$$k_m(j+1) = k_m(j) - \alpha\,\frac{f_m(j)\,g_{m-1}(j)}{\beta_{m-1}(j)},$$

where

$$\beta_m(j+1) = (1-\alpha)\,\beta_m(j) + \alpha\,g_m^2(j).$$

Here α is the normalized step size, determined by the degree of relative misadjustment desired. The signals f_m(j) and g_m(j) are generated by the lattice filter. The quantity β_m(j) is a single-pole filtered estimate of the average backward prediction error energy:

$$\beta_m(j) \approx E\{g_m^2(j)\}.$$

Dividing by β_m(j) in the orthogonal basis {g_m(j)} is equivalent to multiplying by R^{-1} in the {v(j-m)} basis. The sequences {g_m(j)} will approach orthogonality only in steady state.


E. Adaptive Noise Filter in Orthogonal Coordinates

Define H as the N x 1 noise filter vector to be estimated in the {g_m(j)} basis. The output error at the mth stage is given by:

$$s_m(j) = s_{m-1}(j) - h_m(j)\,g_{m-1}(j),$$

where s_0(j) = x(j-1).

Taking the derivative of the squared output error gives the instantaneous gradient at the mth stage:

$$\frac{\partial s_m^2(j)}{\partial h_m} = -2\,s_m(j)\,g_{m-1}(j).$$

The adaptive algorithm is then defined as:

$$h_m(j+1) = h_m(j) + \alpha\,\frac{s_m(j)\,g_{m-1}(j)}{\beta_{m-1}(j)}.$$

The adaptive filter estimate of the speech waveform is equal to s_N(j).
Figure 3 shows the composite adaptive noise cancelling lattice filter algorithm.
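The composite algorithm can be sketched sample by sample in Python by combining the lattice, k-parameter, β, and noise-filter updates of Sections VI.C through VI.E. This is an illustrative sketch built from those update equations, not the authors' array-processor implementation. The initial β values, the AR(1) reference, and the 3-tap noise path are assumptions chosen so the example settles quickly. `lattice_canceller` is a hypothetical name.

```python
import numpy as np

def lattice_canceller(x, v, N, alpha, beta0=1.0):
    """Adaptive lattice noise canceller (sketch).

    Per sample j:  f_0 = v(j), g_0 = v(j-1), s_0 = x(j-1); then for m = 1..N
      s_m = s_{m-1} - h_m g_{m-1}              (noise filter, orthogonal basis)
      f_m = f_{m-1} + k_m g_{m-1}              (forward error)
      g_m(j+1) = g_{m-1}(j) + k_m f_{m-1}(j)   (backward error, next sample)
      k_m -= alpha f_m g_{m-1} / beta_{m-1};  h_m += alpha s_m g_{m-1} / beta_{m-1}
      beta_{m-1} <- (1 - alpha) beta_{m-1} + alpha g_{m-1}^2
    The output s_N(j) is the speech estimate. k_N is updated but never affects it.
    """
    k = np.zeros(N)
    h = np.zeros(N)
    beta = np.full(N, beta0)
    g = np.zeros(N)                         # g_0(j) .. g_{N-1}(j)
    out = np.zeros(len(x))
    for j in range(1, len(x)):
        g[0] = v[j - 1]
        f, s = v[j], x[j - 1]
        g_next = np.zeros(N)
        for m in range(1, N + 1):
            gp = g[m - 1]                   # g_{m-1}(j)
            s -= h[m - 1] * gp              # s_m(j)
            f_new = f + k[m - 1] * gp       # f_m(j)
            if m < N:
                g_next[m] = gp + k[m - 1] * f   # g_m(j+1)
            step = alpha / beta[m - 1]      # Newton-style normalisation
            k[m - 1] -= step * f_new * gp
            h[m - 1] += step * s * gp
            beta[m - 1] = (1 - alpha) * beta[m - 1] + alpha * gp * gp
            f = f_new
        g = g_next
        out[j] = s
    return out, k, h

rng = np.random.default_rng(0)
u = rng.standard_normal(30_000)
v = np.zeros_like(u)
for j in range(1, len(u)):
    v[j] = 0.5 * v[j - 1] + u[j]               # colored reference (assumed AR(1))
x = np.convolve(v, [0.5, -0.3, 0.2])[:len(v)]  # hypothetical noise path, no speech
out, k, h = lattice_canceller(x, v, 4, alpha=0.002)
```

With α = 0.002 the time constant is roughly 1/α = 500 samples, so 30,000 samples give ample time to settle. In this noise-only example the output power should end far below the input power, and k_1 should hover near -0.5, the value that whitens this reference.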


VII. EXPERIMENTS AND RESULTS

A. Introduction

A controlled data base was generated and a series of experiments was conducted to determine the performance of the two adaptive estimation methods for removing noise from speech. A two input signal data base was recorded with a high degree of control over critical environmental factors. The expected performance of the algorithm was then predicted using a digital simulation of the acoustic environment. The performance of the LMS and adaptive lattice methods was measured in terms of degree of noise power reduction, algorithm settling time, and amount of echo present. These results are summarized, along with the advantages and limitations of each approach.


B. Data Base Generation

When the noise added to the speech at the primary input differs from the noise at the reference input by a single linear stationary system G1(z), the adaptive filter will converge to this linear system and complete noise cancellation results [14]. Referring back to the Wiener solution developed in Section III, this type of experiment corresponds to a situation where the independent noise sources m1(j) and m2(j) are absent and G2(z) = 1. Since the intent of this paper is to investigate the degree of noise suppression possible using an external correlated input, it was decided to construct a recording environment as close as possible to this ideal situation.

An acoustically shielded, hard-walled room with an ambient noise level of approximately 26 dB SPL was used for recording the signals. The room contains audio recording and playback equipment, a computer terminal, and connections to the stereo analog-to-digital and digital-to-analog converters. The acoustic shielding prevented independent noise (modeled as m1(j) and m2(j)) from interfering in the estimation process.
A stationary white noise source was recorded from an analog noise generator onto audio tape. The acoustic noise was generated by playing the audio tape through a loudspeaker into the room. The reference signal microphone was placed next to the loudspeaker, while the primary microphone was placed twelve feet away, next to the control terminal. The talker spoke into the primary microphone while controlling the stereo recording program. The noise power was adjusted to a level at which the recorded speech was completely masked. The signals were low-pass filtered at 3.2 kHz, sampled at 6.67 kHz, and quantized to fifteen bits. Recordings were made with and without speech present, each lasting 23.4 seconds.


C. Digital Simulation


Before  processi 
described  above,  a 
conducted.  Two  estim 
available  from  a 
experiment  each  impul 
by  playing  an  elec 
microphone signals were digitized and stored on disk. Each of the measured room impulse responses was digitally convolved with the digitized white noise source to form the primary and reference inputs. When these signals were processed through the LMS algorithm with 3000 tap weights and a step size corresponding to a misadjustment setting of 1%, the noise power at the output was reduced by 12 dB after 23.4 seconds.

The experiment points out some of the problems to contend with in using the two microphone approach for noise suppression. First, since G2(z) is not an identity, the optimal filter must approximate G1(z)/G2(z). A long all-zero filter is required to approximate the poles induced by 1/G2(z). A series of experiments [16] measuring noise power reduction versus filter length showed that 3000 tap weights with a 1% misadjustment setting resulted in 12 dB noise reduction after 23.4 seconds. When only 1000 tap weights were used the noise power was reduced by 6 dB, and when 500 tap weights were used the noise power was reduced by only 4 dB. Long filters, in turn, induce more excess mean squared error and increase misadjustment. The increased misadjustment can be reduced by decreasing the step size, but at the expense of increasing the algorithm's settling time.


The second problem concerns the non-causality of the estimated filter. There is no guarantee that G2(z) will be minimum phase, and thus a stable estimate of G1(z)/G2(z) may be non-causal. Non-causal adaptive filter estimates are easily generated by placing a delay into the primary channel [5]. However, more tap weights are then required, with the accompanying misadjustment problems described above. Also, the amount of delay depends on the microphone placement with respect to the noise source. In the digital simulation experiment both microphones were placed approximately eight feet from the loudspeaker, and the estimated adaptive filter impulse response then required a delay of 1500 points. To minimize this non-causal delay requirement for the acoustic experiment, the reference microphone was moved next to the loudspeaker. As is seen in the section on results, placing the reference microphone close to the noise source removed the non-causal filter effects.

This simulation predicted the potential performance achievable. In fact, considerably better performance was measured in the actual acoustic experiments described below.
D. Results Using the LMS Algorithm on the Acoustic Data

The algorithm's performance is measured in terms of three factors:

- the degree of steady-state noise power reduction during non-speech activity;
- the time it takes to reach this steady-state value (algorithm settling time);
- the amount of echo induced when speech is present.

The first two factors were measured quantitatively, while the third was determined from listening tests.

Algorithm settling time can be minimized by choosing a large step size. This, however, will increase the echo present in the speech output, because the output is fed back when estimating the tap weights. A large echo is unacceptable in the noise suppression algorithm. Three experiments were conducted to measure algorithm settling time, differing in the amount of misadjustment specified.

Step sizes were used corresponding to misadjustments of 1%, 5%, and 10%. Based on the simulation experiment, fifteen hundred tap weights were used for estimating the noise filter. The steady-state noise reduction results for the LMS algorithm are shown in Figure 4. They show that the algorithm converges to a steady-state noise power reduction of 20 dB in approximately 15 seconds for 10% misadjustment and 21 seconds for 5% misadjustment. At 1% misadjustment the step size was so small that the noise power was reduced by only 10 dB before the data ran out.

In listening to the output during speech activity, it was judged that at a 10% misadjustment setting an unacceptable amount of echo was present, and that at a 5% setting the echo was just noticeable. For the 5% and 10% misadjustment settings there was significant noise suppression and corresponding speech enhancement. At the 1% misadjustment setting, the slow settling time left the output with a noise floor 10 dB higher than the 5% and 10% misadjustment outputs.

To illustrate this noise suppression capability, isometric plots of time versus frequency magnitude spectra of speech with and without noise suppression are shown in Figures 5 and 6. The plots were constructed by computing magnitude spectra from 54 half-overlapped, Hanning-windowed data sets. Each line represents a 128-point frequency analysis. Time increases from bottom to top and frequency from left to right. Figure 5 corresponds to the unprocessed speech signal "The pipe began to". Figure 6 corresponds to the processed speech signal using a 5% misadjustment step size. This phrase occurs 17.5 seconds after startup. Finally, Figure 7 shows the noise shaping filter W estimated by the LMS algorithm after processing 23.4 seconds of noise-only signal.
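The spectral displays described above can be approximated with a short routine. This sketch assumes "half overlapped" means a hop of 64 samples between 128-point frames, and it uses NumPy's symmetric Hanning window. The 54 frames then span (54 - 1) x 64 + 128 = 3520 samples, about 0.53 s at 6.67 kHz, with a bin spacing of 6670/128 ≈ 52 Hz. `short_time_spectra` is an illustrative name.

```python
import numpy as np

def short_time_spectra(y, nfft=128, nframes=54):
    """Magnitude spectra of half-overlapped, Hanning-windowed frames (one row per frame)."""
    hop = nfft // 2                                  # half overlap
    needed = (nframes - 1) * hop + nfft
    if len(y) < needed:
        raise ValueError(f"need at least {needed} samples, got {len(y)}")
    win = np.hanning(nfft)
    frames = np.array([y[i * hop:i * hop + nfft] * win for i in range(nframes)])
    return np.abs(np.fft.rfft(frames, axis=1))       # rows: time, columns: frequency

t = np.arange(4000)
y = np.sin(2 * np.pi * 16 / 128 * t)   # test tone centred on FFT bin 16 (about 834 Hz)
S = short_time_spectra(y)              # 54 x 65 array: one row per plotted line
```

Plotting each row of S offset upward in time gives the isometric time-versus-frequency view.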


E. Results Using the Adaptive Lattice Algorithm

A similar set of experiments was made to measure algorithm settling time and amount of echo present for three representative misadjustment step sizes. Using the approximate misadjustment expression given in Section VI.B, step sizes α corresponding to misadjustments of 10%, 3.3%, and 1% were used. The convergence characteristics of the algorithm are shown in Figure 8.

In listening to the output during speech activity, three judgments were made. At the 10% misadjustment setting the amount of echo present was unacceptable (actually worse than the 10% case for LMS). At 3.3% the echo was just noticeable (judged equal to the 5% case for LMS). At 1% there was so little noise reduction that any echo present was irrelevant.

For the 10% and 3.3% misadjustment settings there was significant noise suppression during speech activity. Figure 9 shows the time versus frequency magnitude spectrum of the output of the adaptive lattice algorithm at the 3.3% misadjustment setting for the same speech phrase.

There are four sets of filter parameters generated by the adaptive lattice algorithm. Figure 10 shows the tap weights H in the orthogonal basis {g_m(j)}. Figure 11 shows the k-parameters for the lattice filter, and Figure 12 shows the average backward prediction energies {β_m}. The β_m are not strictly monotonically decreasing because of the one-pole digital smoothing. The corresponding tap weights in the reference signal basis can be obtained by multiplying the H vector by the matrix which transforms k-parameters into linear prediction coefficients. This matrix is defined in [12] and can be generated by the STEPUP procedure given in [12]. Figure 13 shows the tap weights obtained from this transformation. Each of these parameter sets was recorded at the end of the 23.4 second noise-only data segment using the 3.3% misadjustment step size. Note that the filters shown in Figures 7 and 13 are quite similar, differing primarily because of tap weight noise. This is to be expected, since they both represent estimates of the Wiener filter in the same basis.
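The conversion from the orthogonal-basis weights H to transversal tap weights can be sketched as follows. It assumes the lattice convention of Section VI.C, with g_0(j) = v(j-1) and forward and backward errors as defined there. Under that convention g_m(j) = Σ_i a_m(m-i) v(j-1-i), where the a_m are the predictor coefficients produced by the step-up recursion. `step_up` and `lattice_to_transversal` are hypothetical helpers written for illustration, not the STEPUP procedure of [12] itself.

```python
import numpy as np

def step_up(k):
    """Step-up recursion a_m(i) = a_{m-1}(i) + k_m a_{m-1}(m-i), starting from a_0 = [1].
    Returns [a_0, a_1, ..., a_M], where a_m holds a_m(0..m)."""
    A = [np.array([1.0])]
    for km in k:
        prev = np.append(A[-1], 0.0)       # a_{m-1}(0..m) with a_{m-1}(m) = 0
        A.append(prev + km * prev[::-1])   # prev[::-1][i] = a_{m-1}(m-i)
    return A

def lattice_to_transversal(k, h):
    """Map h_1..h_N (weights on g_0..g_{N-1}) to taps w_1..w_N on v(j-1)..v(j-N)."""
    N = len(h)
    A = step_up(k[:N - 1])                 # only k_1..k_{N-1} shape g_0..g_{N-1}
    w = np.zeros(N)
    for m in range(N):
        w[:m + 1] += h[m] * A[m][::-1]     # A[m][::-1][i] = a_m(m-i)
    return w

# Example with made-up parameters: 3 k-parameters and 4 orthogonal-basis weights.
w = lattice_to_transversal(np.array([-0.5, 0.2, 0.1]), np.array([0.4, -0.3, 0.2, 0.1]))
```

Because the backward errors are just the predictor polynomials reversed, the map from H to W is a triangular matrix. Accumulating the weighted, reversed polynomials is equivalent to multiplying H by that matrix.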

VIII. CONCLUSIONS

A. Comparison of Methods

In terms of noise power reduction and amount of echo present, both approaches can be adjusted to give equivalent results. With step sizes corresponding to misadjustments of approximately 5% (LMS) and 3.3% (lattice), each algorithm converges (noise power down 20 dB) after 20 seconds of processing, with a just noticeable amount of echo. The adaptation rates are not significantly different. These equivalent results are to be expected, since the reference signal is just white noise colored by the room's acoustics. The averaged backward prediction error energies and the k-parameters are nearly constant after the first one hundred values. Thus, for this environment, the normalization offered by the gradient lattice offers little advantage over LMS. For environments with a large ratio between the largest and smallest eigenvalues, the gradient lattice method has been shown to converge faster [9].
The computational price paid for the orthogonalization and normalization is high compared to the LMS approach. The LMS algorithm requires 2N multiply-adds per sample, while the gradient lattice requires 10N multiplies, 6N adds, and 2N divides per sample. For fifteen hundred tap weights at a sampling rate of 6.67 kHz, LMS requires 20 million multiply-adds per second, and the gradient lattice requires at least 120 million multiply-adds per second to process this data in real time. This enormous computational requirement necessitated implementing both algorithms on an FPS 120-B array processor. These micro-coded implementations gave a 30 to 1 speed-up over a conventional general purpose DEC-10 processor. Even so, neither algorithm ran in real time; both were processed in a non-real-time, disk-to-disk configuration.
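The real-time throughput figures follow from simple arithmetic. The split used for the lattice below is an assumption: it counts 10N multiplies plus 2N divides and folds the 6N adds into multiply-add operations. That split happens to reproduce the quoted 120 million figure.

```python
# Operation rates for N taps at sampling rate fs.
N = 1500
fs = 6670                      # 6.67 kHz, as stated in the text

lms_per_sec = 2 * N * fs       # 2N multiply-adds per sample
# Assumption: lattice count = 10N multiplies + 2N divides, adds absorbed into multiply-adds.
lattice_per_sec = (10 + 2) * N * fs

print(f"LMS:     {lms_per_sec / 1e6:.1f} M ops/s")
print(f"lattice: {lattice_per_sec / 1e6:.1f} M ops/s")
```

The ratio of the two rates is simply 12N / 2N = 6, independent of N and of the sampling rate.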

In addition, the gradient lattice method has the disadvantage that it requires an estimate of the average backward prediction error energy. In this implementation, these estimates were obtained by smoothing the squared backward prediction values through a single-pole filter. For nonstationary reference signals with a large dynamic range, this smoothing approach may be unable to track the gain variations, resulting in an unstable adaptive filter.


B. Summary

This paper addressed the problem of reducing the acoustic
noise added to speech by subtracting an adaptively filtered
version of a second, correlated noise source. Two adaptive
algorithms were developed and their performance characteristics
measured using an acoustic signal in which the noise power was
equal to or greater than the speech power. For both approaches it
was shown that significant noise reduction is possible with
minimal distortion of the speech waveform.
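The two-microphone scheme summarized above can be sketched with a standard LMS adaptive noise canceller: an FIR filter applied to the reference (noise) microphone is adapted to minimize the output power, and the output, primary minus filtered reference, is the speech estimate. This is a minimal NumPy illustration under assumed conditions (synthetic white noise, a sinusoidal stand-in for speech, a hypothetical three-tap acoustic path `h`, and an arbitrary step size `mu`); it is not the array-processor implementation described in this report, and the figures it prints say nothing about the measured results.

```python
import numpy as np


def lms_cancel(primary: np.ndarray, reference: np.ndarray, n_taps: int, mu: float):
    """Two-microphone LMS adaptive noise canceller.

    primary   -- speech plus noise from the talker's microphone
    reference -- correlated noise from the second microphone
    Returns (e, w): the output e = primary - w.x, which is the speech
    estimate, and the final tap weights w."""
    w = np.zeros(n_taps)
    x = np.zeros(n_taps)                 # most recent reference samples, newest first
    e = np.empty(len(primary))
    for n in range(len(primary)):
        x = np.roll(x, 1)
        x[0] = reference[n]
        y = w @ x                        # estimate of the noise in the primary input
        e[n] = primary[n] - y            # speech estimate (error signal)
        w += 2.0 * mu * e[n] * x         # LMS weight update
    return e, w


# Synthetic test: noise power (0.84) well above "speech" power (0.045).
rng = np.random.default_rng(1)
N = 20000
noise = rng.standard_normal(N)                          # reference microphone
h = np.array([0.8, -0.4, 0.2])                          # hypothetical acoustic path
noise_at_primary = np.convolve(noise, h)[:N]
speech = 0.3 * np.sin(2 * np.pi * 0.01 * np.arange(N))  # stand-in for speech
primary = speech + noise_at_primary

e, w = lms_cancel(primary, noise, n_taps=8, mu=0.002)
tail = slice(N // 2, N)                                 # after convergence
before = np.mean(noise_at_primary[tail] ** 2)
after = np.mean((e[tail] - speech[tail]) ** 2)
reduction_db = 10 * np.log10(before / after)
print(f"noise power reduction: {reduction_db:.1f} dB")
print("first taps:", np.round(w[:3], 2))
```

Because the speech is uncorrelated with the reference, minimizing the output power removes only the noise component; the step size trades convergence time against residual (misadjustment) noise.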

In summary, although this two-microphone approach to
noise suppression requires a second signal and considerable
computation, it offers a potentially powerful alternative
for speech enhancement in severe noise
environments.
4. Noise Power Reduction versus Processing Time for the
LMS Algorithm for Misadjustments of 10%, 5%, and 1%.
7. Noise Shaping Filter, W, Estimated by the LMS Algorithm.
8. Noise Power Reduction versus Processing Time for the
Adaptive Lattice Algorithm for Misadjustments of 10%,
3.3%, and 1%.
Short-Time Spectrum of the Lattice Gradient Output.
1. K-Parameter Estimates for the Lattice Filter.
Noise Shaping Filter, W, Estimated by the Lattice
Gradient Algorithm.
Abstract

Key ideas toward the development of a linear, narrowband
digital voice analysis/synthesis algorithm for use in
multiple-talker and conferencing environments are presented.
The combined use of articulation rate change, signal
extrapolation (analytic continuation), and 2-D AGC techniques
is discussed, problems are highlighted, and current results
are presented for some of these areas. Unlike most narrowband
vocoder algorithms, this approach does not parameterize speech;
instead, it applies data compression ideas to the speech
waveform itself. This gives it the property of linearity,
which makes it suitable for use in conferencing and
multiple-talker environments. Such a system is also expected
to degrade gracefully with noise.


I. Introduction

Vocoder algorithms that operate at rates of 2.4 to 4.8
kilobits/s are considered narrowband. These vocoders,
such as the LPC vocoder and the channel vocoder, parameterize
the speech signal, attempting to extract the model parameters
in such a way that the model output closely fits the actual
signal. Noise in the speech signal makes it very difficult to
extract the model parameters accurately. Thus, these vocoders
degrade drastically as the noise level increases. Also, because
of the parameterization they are not linear and hence
cannot be used in conferencing environments.

This note presents several ideas which, in combination,
point to the possibility of developing a linear, narrowband
voice analysis/synthesis algorithm that degrades gracefully
with noise. Because it is linear, the algorithm satisfies the
superposition principle and hence can be used in conferencing
environments.

The approach considered consists of the following
steps. The speech signal is band limited and transformed
into the short-time Fourier domain. Two-dimensional
automatic gain control (2-D AGC) is then applied, which
results in a modified speech signal in the time domain. Mike
Callahan [1] has shown that the number of bits required to
quantize each sample of this signal is less than half of that
required by log PCM techniques for the same quantization noise
level. In addition, the instantaneous phase and center frequency
of each channel in the short-time Fourier domain are scaled by a
factor less than unity, which reduces the bandwidth occupied by
the resulting signal. Hence, this signal requires a lower
sampling rate. The final time-domain signal is divided into
short segments, and only one in every two or three segments is
transmitted. At the receiver, signal extrapolation techniques are
applied to the transmitted segments to recover the segments that
were not transmitted. Then the first two processes are inverted
to produce a signal that closely approximates the original.
Since no parameter extraction is involved and the coder is of
the waveform type, the algorithm will be linear and will exhibit
graceful degradation with noise.
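The segment-dropping step described above can be sketched as follows. This is a minimal illustration with hypothetical names and parameters (`select_segments`, `seg_len`, `keep_every`); it shows only which samples are transmitted and which must be filled in at the receiver, and it does not model the AGC or bandwidth-compression stages.

```python
import numpy as np


def select_segments(x: np.ndarray, seg_len: int, keep_every: int):
    """Split x into consecutive segments of seg_len samples and keep only
    segments 0, keep_every, 2*keep_every, ... (any trailing partial segment
    is ignored). Returns the kept segments and a boolean mask marking the
    transmitted samples; the receiver must reconstruct the rest."""
    n_segs = len(x) // seg_len
    mask = np.zeros(len(x), dtype=bool)
    kept = []
    for s in range(0, n_segs, keep_every):
        start = s * seg_len
        mask[start:start + seg_len] = True
        kept.append(x[start:start + seg_len].copy())
    return kept, mask


x = np.arange(12.0)
kept, mask = select_segments(x, seg_len=2, keep_every=3)
print(kept)          # the segments [0, 1] and [6, 7]
print(mask.mean())   # one third of the samples are transmitted
```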

The technique for articulation rate change and the
problems involved are discussed in Section II. Signal
extrapolation techniques are discussed in Section III. Section
IV briefly summarizes Callahan's results with the 2-D
AGC experiments. Problems that require further research
are highlighted in Section V.


II.  Articulation  Rate  Change  Techniques 


Theory:


The  speech  signal  is  analysed  into  several  bands  in  the 
frequency  domain  using  a  Short-Time  Fourier  analyzer  or  a 
Constant-Q  analyzer.  The  instantaneous  phase  and  center 
frequency  of  each  channel  are  both  scaled  by  the  same  factor 
and  a  time  domain  signal  is  synthesized  from  the  resulting 
channel  signals.  The  process  defined  by  these  steps  leads 
to  the  scaling  of  the  bandwidth  of  the  synthesized  signal  by 
a  factor  equal  to  that  used  to  scale  the  phase  and  center 
frequencies.  To  be  used  as  a  bandwidth  compression 
technique,  a  scale  factor  less  than  unity  should  be  used  at 
the  sender  and  the  reciprocal  of  that  factor  at  the 
receiver.  Some  of  the  fundamental  limitations  and  other 
problems  associated  with  this  approach  to  bandwidth  scaling 
are  discussed  below. 
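The effect of scaling the phase and center frequency can be seen on a single synthetic channel. In this sketch (all signals and parameters are hypothetical), a channel of the form q(t) cos(ω_c t + φ(t)) is resynthesized with both ω_c t and φ(t) multiplied by α, and the spectral centroid, used here simply as a convenient measure of where the channel's energy lies, is compared before and after.

```python
import numpy as np

FS = 8000.0                                   # hypothetical sampling rate, Hz
t = np.arange(8000) / FS                      # one second of samples

# Hypothetical single-channel AAM signal q(t) cos(w_c t + phi(t)).
w_c = 2 * np.pi * 1000.0                      # 1 kHz center frequency
q = 1.0 + 0.3 * np.cos(2 * np.pi * 3.0 * t)   # slowly varying magnitude
phi = 0.5 * np.sin(2 * np.pi * 5.0 * t)       # slowly varying phase


def synthesize(alpha: float) -> np.ndarray:
    """Scale the center frequency and instantaneous phase by alpha;
    the magnitude signal q is left unchanged."""
    return q * np.cos(alpha * (w_c * t + phi))


def spectral_centroid(x: np.ndarray) -> float:
    """Power-weighted mean frequency (Hz) over the positive-frequency spectrum."""
    power = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x), d=1.0 / FS)
    return float(np.sum(freqs * power) / np.sum(power))


c_orig = spectral_centroid(synthesize(1.0))
c_half = spectral_centroid(synthesize(0.5))
print(f"centroid {c_orig:.1f} Hz -> {c_half:.1f} Hz (ratio {c_half / c_orig:.3f})")
```

Only the phase and center frequency are scaled here; q(t) keeps its 3 Hz variation, so the bandwidth of the magnitude signal is not compressed. That is the situation addressed by the Kahn-Thomas relation discussed below.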


The procedure involves dividing a speech signal into
several bandpass signals using any of the analysis
techniques. The signal in the nth band, $f_n(t)$, can be
modeled as a simultaneously amplitude- and angle-modulated
(AAM) wave:

$$ f_n(t) = q_n(t)\cos\left(\omega_n t + \phi_n(t)\right) $$

where $q_n(t)$ is the magnitude signal in the nth channel,
$\phi_n(t)$ is the phase signal in the nth channel, and
$\omega_n$ is the center frequency of the nth channel. The
complete signal $f(t)$ is given by

$$ f(t) = \sum_{n=0}^{N-1} f_n(t) $$

where N is the number of channels.


Kahn and Thomas [2] have studied the bandwidth
properties of such signals and have derived the
following equation for the instantaneous bandwidth of the
AAM wave:

$$ \Omega_n^2 = \Omega_{q_n}^2 + \frac{\lVert q_n(t)\,\dot{\phi}_n(t) \rVert^2}{\lVert q_n(t) \rVert^2} $$

where:

$\Omega_n$ is the second moment bandwidth of the signal in the nth channel,
$\Omega_{q_n}$ is the second moment bandwidth of the magnitude signal in the nth channel,
$q_n(t)$ is the magnitude signal in the nth channel,
$\dot{\phi}_n(t)$ is the derivative of the phase signal in the nth channel, and
$\lVert \cdot \rVert$ is the norm of a vector in the function space.


The second moment bandwidth $\Omega$ of a signal $f(t)$ is the
root-mean-square width of its energy spectrum $\lvert F(\omega) \rvert^2$
about its center frequency. Ensemble averages are used for
non-deterministic signals.

It is clear from the Kahn-Thomas expression that scaling
$\dot{\phi}_n(t)$ changes $\Omega_n$. But the amount by which
$\Omega_n$ is scaled depends on the relative energies in the
magnitude and phase signals: $\Omega_n$ scales in proportion to
$\dot{\phi}_n(t)$ only when $\Omega_{q_n} = 0$, which in general is
not true. This nonlinearity results in incorrect scaling of the
bandwidths of the channel signals, which can lead to frequency
aliasing on bandwidth compression and reverberation on expansion.
We will call this the "Kahn-Thomas effect". Thus, it is necessary
not only to scale the phase signal but also to scale the bandwidth
of the magnitude signal in each channel. The approach considered in
this research is to apply the bandwidth compression process defined
above recursively to each channel magnitude signal. That is, each
channel magnitude signal is further analysed into subchannels, each
consisting of a magnitude and a phase signal. Scaling the phase of
the subchannels leads to scaling of the magnitude signals at the
next higher level. This idea can be carried further by analysing
the subchannel magnitude signals into still narrower channels and
scaling the phase at that level. Of course, the depth of recursion
is limited by the difficulty of implementing the analysis filters.
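The Kahn-Thomas effect can be made concrete with a short calculation. This sketch assumes the bandwidth relation in the form $\Omega_n^2 = \Omega_{q_n}^2 + \lVert q_n \dot{\phi}_n \rVert^2 / \lVert q_n \rVert^2$, sets the phase term to 1, and uses made-up values of $\Omega_{q_n}$; it computes the factor by which $\Omega_n$ actually changes when only the phase derivative is scaled by α = 0.5.

```python
import math


def scaled_bandwidth_ratio(omega_q: float, phase_term: float, alpha: float) -> float:
    """Factor by which Omega_n changes when only the phase derivative is
    scaled by alpha, assuming Omega_n**2 = omega_q**2 + phase_term**2,
    where phase_term**2 = ||q_n phi_n'||**2 / ||q_n||**2."""
    before = math.hypot(omega_q, phase_term)
    after = math.hypot(omega_q, alpha * phase_term)
    return after / before


alpha = 0.5
for omega_q in (0.0, 0.5, 1.0, 2.0):   # hypothetical magnitude-signal bandwidths
    ratio = scaled_bandwidth_ratio(omega_q, phase_term=1.0, alpha=alpha)
    print(f"Omega_q = {omega_q:.1f}: bandwidth scaled by {ratio:.3f} (target {alpha})")
```

The printed ratios are 0.500, 0.632, 0.791 and 0.922: the channel bandwidth is halved only when $\Omega_{q_n} = 0$, and the larger the magnitude-signal bandwidth, the less the channel bandwidth is compressed.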

A fundamental limitation arises when this type of
bandwidth compression is attempted. The technique described
attempts to discard redundant information in speech, like
[...] causes excessive loss of information in each assumed
stationary section of the signal. On subsequent expansion,
the lost information is not recovered. This type of
distortion is perceived as loss of voicing in voiced
sections of speech.

Implementation and Results:

Three  different  analysis  techniques  were  used  to 
implement  the  rate  change.  In  each  case,  the  bandwidth  of  a 
speech  signal  was  compressed  and  re-expanded  and  the 
resulting  speech  compared  with  the  original.  The  three 
schemes  are  described  below. 

(a)  Using  a  Constant-Q  Analyzer: 

A Constant-Q filter bank was used in this scheme. A
Constant-Q analyzer has a frequency resolution that
decreases with increasing frequency, somewhat like the
resolution of the ear, whereas a short-time Fourier analyzer
has constant frequency resolution. The distortions were
severe for compression factors greater than 2. A
signal-dependent background noise, which can be attributed to
the Kahn-Thomas effect, was also observed in the processed
signal. When the recursive approach described above was
applied to the channel magnitude signals with a recursion
depth of two, the signal-dependent noise was reduced, but the
distortions due to the fundamental limitation noted above
persisted.
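The difference in resolution between the two kinds of analyzer can be illustrated by listing channel centers and bandwidths. The sketch below uses a hypothetical layout (lowest center frequency 200 Hz, Q = 4, six channels, and 200 Hz uniform channels), not the filter banks used in these experiments: in the constant-Q bank each bandwidth is proportional to its center frequency, while the uniform bank, like a short-time Fourier analyzer, gives every channel the same bandwidth.

```python
def constant_q_bank(f_low: float, q: float, n_channels: int):
    """(center, bandwidth) pairs in Hz for a constant-Q bank in which every
    channel has bandwidth f_c / q and adjacent channels just touch."""
    ratio = (2 * q + 1) / (2 * q - 1)    # f_c[k+1] / f_c[k] for touching bands
    return [(f_low * ratio ** k, f_low * ratio ** k / q) for k in range(n_channels)]


def constant_bandwidth_bank(bw: float, n_channels: int):
    """(center, bandwidth) pairs in Hz for uniformly spaced channels of equal
    bandwidth, as in a short-time Fourier analyzer."""
    return [((k + 0.5) * bw, bw) for k in range(n_channels)]


# Hypothetical parameters, for illustration only.
for (fc_q, bw_q), (fc_u, bw_u) in zip(constant_q_bank(200.0, 4.0, 6),
                                      constant_bandwidth_bank(200.0, 6)):
    print(f"constant-Q: {fc_q:7.1f} Hz, bw {bw_q:6.1f} Hz | "
          f"uniform: {fc_u:6.1f} Hz, bw {bw_u:5.1f} Hz")
```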
(b) Using a Filter Bank Made Up of Constant-Bandwidth,
Sharp-Cutoff Filters:

The overlap between adjacent channels was reduced by
using this type of filter bank. The basic quality of the
processed speech was the same as in case (a), except that
the background noise was reduced.

(c) Using a Short-Time Fourier Analyzer:

In this case a narrowband analysis system was used,
and essentially the same quality was obtained.
Applying the recursive procedure described earlier to
compensate for the Kahn-Thomas effect is under study.

The experiments suggest that this technique can be
used with compression factors less than 2 without
seriously degrading the signal.
III. Analytic Continuation of Band-Limited Signals,
or Signal Extrapolation

An analytic signal can be recovered completely given
only a section of the signal. This problem can be
characterized as follows. Let $P_a$ and $P_b$ be two subspaces
of a parent Hilbert space H. If the projection of a signal
$f \in P_b$ onto the subspace $P_a$ is given, then under certain
conditions it is possible to realize an inverse operator by
recursive techniques. With this operator, the signal f can
be recovered from its projection. Papoulis and Gerchberg
[3] have independently proposed similar algorithms based on
this formulation. They attempt to obtain the signal f
when $P_b$ is the subspace of all signals band limited to a
particular frequency and $P_a$ is the subspace of all signals
time limited to a particular time interval. Youla [4] has
shown that these are special cases of a more general problem
of solving operator equations in Hilbert spaces and has
derived certain important conclusions. He shows that the
problems of Papoulis and Gerchberg are not well posed. He
also shows, and we have found, that applying these algorithms
to noisy data leads to serious noise amplification. Richard
Frost [5] has modified this approach to obtain a new algorithm
that performs the extrapolation in small amounts with each
iteration and does not add back the distortion energy at each
step (Gerchberg's algorithm does add back the distortion energy
at each step). This makes the algorithm stable in the presence
of noise. He has applied it to the restoration of astronomical
image data and proved that it has better convergence properties
in the presence of noise. Signal extrapolation is expected to be
harder for speech signals, since no assumptions can be made about
the sign of the signal being extrapolated (image data, by
contrast, are always non-negative).
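Gerchberg's iteration can be sketched as alternating projections: impose the band limit in the frequency domain ($P_b$), then restore the known samples in the time domain ($P_a$). This is a minimal, noiseless NumPy illustration that assumes a finite frame treated as periodic by the DFT; the test signal, band limit, and extrapolation length are hypothetical. It is not Frost's modified algorithm, and with noisy data this plain iteration would suffer the noise amplification described above.

```python
import numpy as np


def gerchberg_papoulis(y: np.ndarray, known: np.ndarray, band: np.ndarray,
                       n_iter: int = 1000) -> np.ndarray:
    """Estimate a band-limited signal from its samples on `known`.

    y     -- observed frame (entries outside `known` are ignored)
    known -- boolean mask of observed samples (the time-limited part)
    band  -- boolean mask of DFT bins allowed by the band limit"""
    x = np.where(known, y, 0.0)
    for _ in range(n_iter):
        X = np.fft.fft(x)
        X[~band] = 0.0              # project onto the band-limited signals
        x = np.fft.ifft(X).real
        x[known] = y[known]         # restore the observed samples
    return x


N = 256
band = np.zeros(N, dtype=bool)
band[:6] = True
band[-5:] = True                    # DFT bins -5..5 (a real low-pass band)

rng = np.random.default_rng(2)
X = np.zeros(N, dtype=complex)
X[band] = rng.standard_normal(11) + 1j * rng.standard_normal(11)
signal = np.fft.ifft(X).real        # real part keeps the same symmetric band

known = np.arange(N) < N - 24       # last 24 samples are to be extrapolated
observed = np.where(known, signal, 0.0)
estimate = gerchberg_papoulis(observed, known, band)
rel_err = np.max(np.abs(estimate[~known] - signal[~known])) / np.max(np.abs(signal))
print(f"relative error on the extrapolated samples: {rel_err:.2e}")
```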

In the speech application, the algorithm is used to recover
the missing signal segment between two successive transmitted
segments. The problem can therefore be characterized as follows.
The signal f belongs to the subspace of band-limited functions.
Its projections onto two mutually orthogonal subspaces are
given; these subspaces consist of functions limited to two
different, non-overlapping intervals of time. The problem is then
to find the projection of f onto a third subspace of functions,
orthogonal to both of the given subspaces, namely the functions
limited to the interval between them. The two key issues to be
addressed are the stability of reconstruction in the presence of
noise and the convergence rate. Preliminary experiments with
Gerchberg's algorithm suggest that the segments of the speech
signal must be very short (much less than a pitch period). Other
algorithms based on Richard Frost's step-by-step extrapolation
and on the three-orthogonal-subspace formulation described above
are currently under study.
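The three-subspace formulation can be illustrated on a sampled frame split into a first transmitted segment, a missing segment, and a second transmitted segment. For clarity this sketch solves the noiseless problem by direct least squares on a band-limited basis rather than by Gerchberg's or Frost's iteration; the frame length, segment length, and band limit are hypothetical. The printed condition number is a rough indicator of how strongly noise in the known segments would be amplified, and it generally grows as the missing segment becomes long relative to the band limit.

```python
import numpy as np

N, SEG, K = 96, 32, 5      # hypothetical frame, segment length and band limit
n = np.arange(N)

# Real basis for signals band limited to DFT bins -K..K.
cols = [np.ones(N)]
for k in range(1, K + 1):
    cols += [np.cos(2 * np.pi * k * n / N), np.sin(2 * np.pi * k * n / N)]
B = np.stack(cols, axis=1)                       # N x (2K + 1)

seg1 = n < SEG                                   # first transmitted segment
gap = (n >= SEG) & (n < 2 * SEG)                 # missing segment
seg2 = n >= 2 * SEG                              # second transmitted segment
known = seg1 | seg2

rng = np.random.default_rng(3)
f = B @ rng.standard_normal(B.shape[1])          # band-limited test signal

# Fit the band-limited model to the two given projections, then take the
# projection onto the gap subspace.
c, *_ = np.linalg.lstsq(B[known], f[known], rcond=None)
f_gap = B[gap] @ c
print("max error in the gap:", np.max(np.abs(f_gap - f[gap])))
print("condition number of the fit:", np.linalg.cond(B[known]))
```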


IV.  2-D  AGC  Techniques 


Mike Callahan [1] has developed an AGC technique to be
applied in the short-time Fourier domain. Essentially, he
models the short-time Fourier transform $F(\omega,t)$ of a speech
signal $f(t)$ as the product of an envelope function $E(\omega,t)$
and a vibratory function $V(\omega,t)$, and notes that E is slowly
varying and positive, while V is fast varying and complex.
Then,

$$ \log F(\omega,t) = \log\lvert E(\omega,t)\rvert + \log\lvert V(\omega,t)\rvert + j\,\arg[V(\omega,t)] $$

So passing $\log\lvert F(\omega,t)\rvert$ through a high-pass filter
with a low-frequency gain of $p < 1$, and then undoing the effect of
the logarithm with the phase $\arg[V(\omega,t)]$ retained, leads to a
short-time Fourier transform given by $E^p V$. The time signal
synthesized from $E^p V$ is the original signal with its dynamic
range compressed. Callahan has shown that this signal can be
quantized with 2 to 3 bits/sample to achieve the same
signal-to-quantization-noise ratio as ordinary 8-bit PCM
techniques.
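The 2-D AGC operation can be sketched on a precomputed short-time Fourier transform array $F[\omega, t]$. This is a minimal NumPy illustration, not Callahan's implementation: the low-pass part of $\log\lvert F\rvert$ is estimated with a separable moving average over frequency and time (the function names and window widths are hypothetical), scaled by p, and added back to the high-pass remainder before the phase is reattached. For a slowly varying envelope this maps E approximately to $E^p$ and leaves V unchanged.

```python
import numpy as np


def moving_average(a: np.ndarray, width: int, axis: int) -> np.ndarray:
    """Centered moving average along one axis; edge padding keeps a
    constant input unchanged."""
    pad = [(0, 0)] * a.ndim
    pad[axis] = (width // 2, width - 1 - width // 2)
    padded = np.pad(a, pad, mode="edge")
    kernel = np.ones(width) / width
    return np.apply_along_axis(lambda v: np.convolve(v, kernel, mode="valid"),
                               axis, padded)


def agc_2d(F: np.ndarray, p: float, width_w: int = 5, width_t: int = 9) -> np.ndarray:
    """Dynamic-range compression of an STFT array F[frequency, time].

    The low-pass part of log|F| (a 2-D moving average) is scaled by p and the
    high-pass remainder is kept, so E -> E**p while V is left unchanged; the
    original phase is then reattached."""
    log_mag = np.log(np.abs(F) + 1e-12)          # small floor avoids log(0)
    low = moving_average(moving_average(log_mag, width_w, axis=0), width_t, axis=1)
    high = log_mag - low
    return np.exp(p * low + high) * np.exp(1j * np.angle(F))


def range_db(a: np.ndarray) -> float:
    """Ratio of largest to smallest magnitude, in dB."""
    return float(20 * np.log10(np.max(np.abs(a)) / np.min(np.abs(a))))


# Hypothetical STFT: envelope rising slowly from 1 to 1000 (60 dB) over
# 200 frames, times a unit-magnitude vibratory term with random phase.
rng = np.random.default_rng(4)
E = np.tile(np.logspace(0, 3, 200), (64, 1))
V = np.exp(1j * rng.uniform(-np.pi, np.pi, size=(64, 200)))
out = agc_2d(E * V, p=0.5)
print(f"dynamic range: {range_db(E * V):.1f} dB -> {range_db(out):.1f} dB")
```

With p = 0.5 the 60 dB envelope swing of this synthetic input comes out at roughly 30 dB; the small excess comes from the edge padding of the moving average.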

This technique reduces the number of bits required per sample
of the speech signal, independently of the techniques described
in the previous sections, which attempt to reduce the effective
sampling rate. Hence it may be possible to use these compression
ideas in tandem to achieve low bit rates.
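The tandem idea amounts to multiplying the savings of the separate stages. The arithmetic below is illustrative only: every parameter value is hypothetical (chosen, where this note gives a range, to fall inside it) and none is a result from this work. It assumes the sampling rate is divided by the bandwidth compression factor, the bits per sample are set by the AGC stage, and only one segment in k is sent, and it ignores any side information.

```python
def tandem_bit_rate(fs_hz: float, bits_per_sample: float,
                    bw_compression: float, keep_one_in: int) -> float:
    """Transmitted bit rate in bits/s when bandwidth compression divides the
    sampling rate by bw_compression, AGC sets the bits per sample, and only
    one segment in keep_one_in is sent (side information is ignored)."""
    return fs_hz / bw_compression * bits_per_sample / keep_one_in


# Hypothetical values, for illustration only (not results from this work).
pcm = tandem_bit_rate(8000.0, 8, 1.0, 1)        # ordinary 8-bit PCM
combined = tandem_bit_rate(8000.0, 3, 1.6, 3)   # 3 bits/sample, factor 1.6, 1 in 3
print(f"{pcm:.0f} b/s -> {combined:.0f} b/s")
```

Under these assumed values the rate falls from 64000 b/s to 5000 b/s; in practice each stage's degradation would also have to be acceptable when the stages are combined.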
V. Future Work


In  the  area  of  articulation  rate  change,  the  effect  of 
recursive  correction  for  the  Kahn-Thomas  effect,  when  using 
Short-Time  Fourier  analysis,  is  to  be  studied. 

Work needs to be done in the area of signal
extrapolation to study the performance of various existing
algorithms when applied to speech and to develop modifications.
More research is needed to develop new algorithms suited to
the speech application. The application of existing one-step
extrapolation procedures to speech reconstruction is to be
studied.

In all cases, work is required to better condition
the problem in the presence of noise, even at the cost of
imperfect but acceptable reconstruction of noise-free
signals.


VI 